Abstract

Shannon's fundamental bound for perfect secrecy says that the entropy of the secret message 
cannot be larger than the entropy of the secret key initially shared by the sender and the legitimate 
receiver. Massey gave an information-theoretic proof of this result; however, this proof does not require
independence of the key and the message. By further assuming independence, we obtain a tighter lower
bound, namely that the key entropy is not less than the logarithm of the message sample size in any 
cipher achieving perfect secrecy, even if the source distribution is fixed. The same bound also applies 
to the entropy of the ciphertext. The bounds still hold if the secret message has been compressed before 
encryption. 

This paper also illustrates that the lower bound only gives the minimum size of the pre-shared 
secret key. When a cipher system is used multiple times, this is no longer a reasonable measure for the 
portion of key consumed in each round. Instead, this paper proposes and justifies a new measure for 
key consumption rate. The existence of a fundamental tradeoff between the expected key consumption 
and the number of channel uses for conveying a ciphertext is shown. Optimal and nearly optimal secure 
codes are designed. 

I. Introduction

Cipher systems with perfect secrecy were studied by Shannon in his seminal paper [1] (see
also [2]). With reference to Figure 1, a cipher system is defined by three components: a source
message U, a ciphertext X and a key R. The key is secret common randomness shared by the 
sender and the legitimate receiver. The sender encrypts the message U, together with the key R, 
into the ciphertext X. This ciphertext will be transmitted to the legitimate receiver via a public 
channel. A cipher system is perfectly secure, or equivalently, satisfies a perfect secrecy constraint 
if the message U and the ciphertext X are statistically independent, I(U;X) = 0. In this case, 
an adversary who eavesdrops on the public channel and learns X (but does not have R) will 
not be able to infer any information about the message U. On the other hand, the legitimate 
receiver decrypts the message U from the received ciphertext X together with the secret key R. 
A cipher system is error-free (i.e., the probability of decoding error is zero) if H(U | XR) = 0.

Fig. 1. A cipher system.



By considering a deterministic cipher, where X is a deterministic function of R and U, 
Shannon showed that the number of messages is equal to the number of possible ciphertexts, 
and that the number of different keys is not less than the number of messages [1, p. 681],

$$|\mathcal{X}| = |\mathcal{U}| \le |\mathcal{R}|,$$

where $\mathcal{X}$, $\mathcal{U}$ and $\mathcal{R}$ are the respective supports of X, U and R. In order to design a perfectly
secure cipher system protecting a source with unknown source distribution $P_U$, Shannon argued
that

$$H(R) \ge \log|\mathcal{U}| \ge H(U). \tag{1}$$

He also made an important observation [1, p. 682] that

"the amount of uncertainty we can introduce into the solution cannot be greater
than the key uncertainty".

In other words,

$$H(R) \ge H(U). \tag{2}$$

Massey [2] called (2) Shannon's fundamental bound for perfect secrecy, and gave an information-theoretic
proof for this result. It is important to note that Massey's proof [2] does not require U
and R to be statistically independent. 

Now, suppose U and R are indeed independent (which is common in practice). Our first main 
result, Theorem 1, improves (2), showing that for any source distribution $P_U$,

$$P_R(r) \le |\mathcal{U}|^{-1}, \quad \forall r. \tag{3}$$

As a consequence, we prove that for any cipher achieving perfect secrecy, the logarithm of the
message sample size cannot be larger than the entropy of the secret key,

$$H(R) \ge \log|\mathcal{U}|. \tag{4}$$

Comparing with the first inequality in (1), we see that (4) is valid even if the source distribution
$P_U$ is fixed and known.

This paper is based on the model in Fig. 1. Despite its apparent simplicity, this is the most
general encoder possible, and covers many interesting special cases. For example, suppose the
distribution of U is non-uniform. One may expect that the optimal encoder will operate according
to Fig. 2, by first compressing U and then encrypting the compressed output.

Fig. 2. Compression before encryption.



Roughly speaking, compression converts the source into a sequence of independent and
identically distributed (i.i.d.) symbols. Theoretically, this can maximize the adversary's decoding
error probability in some systems [3, Theorem 3]. Practically, the compressed output has a smaller
file size and hence seems to require less key for encryption. This approach of compression before
encryption was also proposed by Shannon [1, p. 682]. In fact, Shannon believed that, after
removing redundancy in the source,

"a bit of key completely conceals a bit of message information".

However, a separate compression-before-encryption model is a special case of our more general
model in Fig. 1. To a certain extent, our model can be viewed as joint compression-encryption
coding. Naturally, our results also apply to models such as Fig. 2, for which we will later prove

$$H(X) \ge \log|\mathcal{U}|.$$

This result, together with (4), in fact suggests that compression before encryption may not be
useful if both perfect secrecy and error-free decoding are required.

Another major contribution of this paper is the introduction of a new concept of expected
key consumption I(R; UX). Previously in the literature, the amount of key required in a cipher
system has been measured by the entropy of the common secret key. We will argue in this paper
that H(R) is only valid for measuring the initial key requirement, by which we mean the amount
of secret randomness that must be shared between the sender and the legitimate receiver, prior
to transmission of the ciphertext. Instead, key consumption should be measured by I(R; UX).
This new measure offers more insights, and in the second part of this paper, we will design
efficient cipher systems that can be used multiple times, where I(R; UX) is one of the system
parameters to be optimised.

Besides expected key consumption, we also want to minimize the number of channel uses
required to transmit the ciphertext X from the source to the legitimate receiver. Naturally, we can
encode the ciphertext X using a Huffman code [4]. Let A(X) be the codeword length. In this case,
the expected codeword length E[A(X)] satisfies $H(X) \le E[A(X)] < H(X) + 1$. Note that for
two random variables X and X', it is possible that H(X) < H(X'), but E[A(X)] > E[A(X')].
One example is when $P_X$ = (0.3, 0.23, 0.2, 0.17, 0.1) and $P_{X'}$ = (0.25, 0.25, 0.25, 0.15, 0.1).
However, we still use H(X) instead of E[A(X)] as a measure for the number of channel uses
required in a cipher system for two reasons: first, H(X) is a lower bound for E[A(X)] and in
fact a very good estimate for E[A(X)]; second, the problem itself is more tractable when using
H(X) instead of E[A(X)].
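
The two-distribution example above can be checked numerically. The following is a minimal sketch, not part of the original paper: the helper names `entropy` and `huffman_expected_length` are illustrative, and the expected Huffman length is computed as the sum of the weights of all merged nodes, which equals the expected codeword length of a binary Huffman code.

```python
import heapq
from math import log2

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * log2(q) for q in p if q > 0)

def huffman_expected_length(p):
    """Expected codeword length of a binary Huffman code for p.

    Each merge of the two smallest weights adds one bit to every leaf
    below the merged node, so the expected length is the sum of the
    weights of all merged (internal) nodes.
    """
    heap = list(p)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

P_X = (0.3, 0.23, 0.2, 0.17, 0.1)
P_X2 = (0.25, 0.25, 0.25, 0.15, 0.1)  # the distribution called P_X' in the text

h1, h2 = entropy(P_X), entropy(P_X2)
l1, l2 = huffman_expected_length(P_X), huffman_expected_length(P_X2)
print(f"H(X)  = {h1:.4f}, E[A(X)]  = {l1:.2f}")
print(f"H(X') = {h2:.4f}, E[A(X')] = {l2:.2f}")
```

Under these assumptions the entropies are ordered one way and the expected Huffman lengths the other way, which is the point of the example.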

We will show that there exists a fundamental tradeoff between the expected key consumption
and the number of channel uses. In fact, if the source distribution is not uniform, then the
minimum expected key consumption and the minimum number of channel uses cannot be
simultaneously achieved. We will also show that code design achieving minimum expected key
consumption depends on whether the source distribution $P_U$ has irrational probability masses or
not. An optimal code will be proposed for $P_U$ with only rational probability masses.

Organization: In Section II, we consider one-shot systems, where there is a single message to
be securely transmitted. We formalize the system model and derive new bounds on H(R) and H(X).
In Section III, we consider the case where a cipher system is used multiple
times. New system parameters including I(R; UX) will be defined and justified. Section IV
will focus on two regimes corresponding to minimal expected key consumption and minimal
number of channel uses. The existence of a fundamental non-trivial tradeoff will be illustrated.
In Section V, the performance of compression-before-encryption will be evaluated.

Notation: Random variables are denoted by capital letters, e.g. X, and their particular
realizations are denoted by small letters, e.g. x. Supports of random variables are denoted by
calligraphic letters, e.g. $\mathcal{X}$.

II. Key Requirements for One-Shot Ciphers 

Definition 1 (Error free perfect secrecy system): A cipher system (R,U,X) is called an 
Error-free Perfect-Secrecy (EPS) system if 

$$I(U; X) = 0, \tag{5}$$

$$H(U \mid RX) = 0, \tag{6}$$

$$I(U; R) = 0. \tag{7}$$

Here, (5) ensures perfect secrecy, via independence of the ciphertext X and source message U.
An eavesdropper learning X can infer no information about the message U. The constraint (6)
ensures that the receiver can reconstruct U from R and X without error. Finally, (7) requires
that the shared secret key R is independent of the message U.

The constraints (5) and (6) were originally used in [2] to prove Shannon's fundamental
bound (2) for perfect secrecy. The only additional constraint in Definition 1 is (7). In practice, R
is usually shared prior to the independent generation of the message U. This is a strong practical
motivation for (7). Furthermore, Definition 1 admits the general case of probabilistic encoding.
For the receiver, it is however sufficient to consider deterministic decoding since, by (6), U is a
function of R and X. In other words, there exists a decoding function g such that

$$P_{URX}(u, r, x) = P_{RX}(r, x)\, \mathbf{1}\{u = g(r, x)\}. \tag{8}$$



Theorem 1 (Lower bounds on H(X) and H(R)): Let (R, U, X) be an error-free perfect-secrecy
system, satisfying (5)-(7) according to Definition 1, and suppose $P_U$ is known. Then

$$\max_x P_X(x) \le |\mathcal{U}|^{-1} \tag{9}$$

and

$$\max_r P_R(r) \le |\mathcal{U}|^{-1}, \tag{10}$$

where $\mathcal{U}$ is the support of the message U. Consequently,

$$\log|\mathcal{U}| \le H(X), \tag{11}$$

with equality if and only if $P_X(x) = |\mathcal{U}|^{-1}$ for all $x \in \mathcal{X}$. Also,

$$\log|\mathcal{U}| \le H(R), \tag{12}$$

with equality if and only if $P_R(r) = |\mathcal{U}|^{-1}$ for all $r \in \mathcal{R}$. If the source distribution is not uniform,
H(X) and H(R) are strictly greater than H(U).
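
Proof: For any $x \in \mathcal{X}$,

$$
\begin{aligned}
|\mathcal{U}|\, P_X(x) &= \sum_{u \in \mathcal{U}} P_X(x) && (13) \\
&= \sum_{u \in \mathcal{U}} P_{X|U}(x \mid u) && (14) \\
&= \sum_{u \in \mathcal{U}} \; \sum_{r:\, P_{URX}(u,r,x) > 0} \frac{P_{URX}(u, r, x)}{P_U(u)} && (15) \\
&= \sum_{u \in \mathcal{U}} \; \sum_{r:\, P_{URX}(u,r,x) > 0} \frac{P_{RX}(r, x)\, \mathbf{1}\{u = g(r, x)\}}{P_U(u)} && (16) \\
&= \sum_{r:\, P_{RX}(r,x) > 0} \frac{P_{RX}(r, x)}{P_U(g(r, x))} && (17) \\
&= \sum_{r:\, P_{RX}(r,x) > 0} P_{RX}(r, x)\, \frac{P_{X|UR}(x \mid g(r,x), r)\, P_R(r)}{P_{URX}(g(r,x), r, x)} && (18) \\
&= \sum_{r:\, P_{RX}(r,x) > 0} P_{X|UR}(x \mid g(r,x), r)\, P_R(r) && (19) \\
&\le \sum_{r:\, P_{RX}(r,x) > 0} P_R(r) && (20) \\
&\le 1, && (21)
\end{aligned}
$$

where (14), (16), (18) and (19) follow from (5), (8), (7) and (8), respectively. This establishes (9).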

Let $P_B$ be a uniform distribution with support $\mathcal{U}$. Since $P_X$ is always majorized¹ by $P_B$ from
(9), [7, Theorem 10] shows that

$$
\begin{aligned}
H(P_X) &\ge H(P_B) + D(P_B \,\|\, P_X) && (22) \\
&\ge H(P_B) && (23) \\
&= \log|\mathcal{U}|, && (24)
\end{aligned}
$$

and hence (11) is verified. Note that [7, Theorem 10] can still be applied even if X may be
defined on a countably infinite alphabet. If $H(P_X) = \log|\mathcal{U}|$, equality in (23) holds so that
$P_X = P_B$. Finally, (10) and (12) follow from the symmetric roles of X and R in (5)-(7). ■

¹ A good introduction to majorization theory can be found in [6]. In this proof, we just need the definition of "majorized by",
which can also be found in [7, Definition 1].

Corollary 2: No error-free perfect-secrecy system can be constructed if the source message
U has a countably infinite support or a support with unbounded size.

Proof: Assume to the contrary that an EPS system exists for a source message $U \sim P_U$
with countably infinite support, $|\mathcal{U}| = \infty$. Note that (13)-(21) are still valid in this case.
However, the conclusion that $|\mathcal{U}|\, P_X(x) \le 1$ for any $x \in \mathcal{X}$ contradicts $|\mathcal{U}| = \infty$. ■

The following three remarks emphasize some of the (perhaps unexpected) consequences of
Theorem 1.

1) One could naturally expect that H(U) is the critical quantity setting a lower bound on
H(R) and H(X). However, Theorem 1 shows that H(R) and H(X) must be arbitrarily
large whenever the size of the support of U is arbitrarily large, even when H(U) is
small.

2) One may further expect that $\log|\mathcal{U}| \le H(R)$ is tight only if the source distribution $P_U$ is
unknown. However, (11) and (12) show that fixing $P_U$ does not reduce the lower bounds
on either the initial key requirement, or the number of channel uses required to convey
the ciphertext.

3) If the source message U is defined on a countably infinite alphabet, it is not possible to
design an error-free perfect-secrecy system (Corollary 2). Therefore, if a cipher system is
required for such a source, at least one of the constraints (5)-(7) must be relaxed.

The following example compares Shannon's fundamental bound (2) with Theorem 1. It also
illustrates that the quantity H(R) is insufficient for determination of the requirements on the
secret key R.

Example 1: Suppose $P_U$ = (0.3, 0.3, 0.3, 0.1) so that H(U) = 1.895 bits and $\log|\mathcal{U}|$ = 2 bits.

1) Consider R chosen independently of U according to $P_R$ = (0.4, 0.2, 0.2, 0.2) so that
H(R) = 1.922 bits and $H(U) < H(R) < \log|\mathcal{U}|$. Although $P_R$ satisfies Shannon's
fundamental bound (2), Theorem 1, in particular (12), shows that this choice of key R is
insufficient to achieve error-free perfect secrecy.

2) Consider $P_R$ = (0.4, 0.15, 0.15, 0.15, 0.15) so that H(R) = 2.171 bits and
$H(U) < \log|\mathcal{U}| < H(R)$. However, this choice of key R is still insufficient for error-free perfect secrecy,
since from (10), $\max_r P_R(r) = 0.4 > 0.25 = |\mathcal{U}|^{-1}$.
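
The numbers in Example 1 can be reproduced with a few lines of code. This is a sketch under stated assumptions: the function name `key_passes_theorem1` is hypothetical, it only tests the necessary condition (10), and the uniform key is added here for contrast and does not appear in the example.

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * log2(q) for q in p if q > 0)

def key_passes_theorem1(p_r, support_size):
    """Necessary condition (10): max_r P_R(r) <= 1/|U| (with a float tolerance)."""
    return max(p_r) <= 1.0 / support_size + 1e-12

P_U = (0.3, 0.3, 0.3, 0.1)
keys = {
    "case 1": (0.4, 0.2, 0.2, 0.2),
    "case 2": (0.4, 0.15, 0.15, 0.15, 0.15),
    "uniform": (0.25, 0.25, 0.25, 0.25),  # added for contrast, not in the example
}

# Map each key to (H(R) rounded to 3 decimals, whether condition (10) holds).
report = {
    name: (round(entropy(p), 3), key_passes_theorem1(p, len(P_U)))
    for name, p in keys.items()
}
h_u = round(entropy(P_U), 3)
print("H(U) =", h_u)
for name, (h_r, ok) in report.items():
    print(f"{name}: H(R) = {h_r}, satisfies (10): {ok}")
```

Both keys from the example satisfy H(R) > H(U), and the second even satisfies H(R) > log|U|, yet neither passes the max-probability test.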

Theorem 1 not only applies to systems of the form shown in Fig. 1 (which includes Fig. 2 as
a special case), but also to multi-letter variations. For example, we can accumulate n symbols
from the source $(U_1, U_2, \ldots, U_n)$ and treat these n symbols together as one super-symbol U. It
is reasonable to consider finite n because practical systems have only finite resources to store
the super-symbol. Unless the source has some special structure, the distribution of U cannot
be uniform for any n if the $U_i$ are not uniform. For example, if the source is stationary and
memoryless, accumulating symbols will only make H(X) and H(R) grow with $n \log|\mathcal{U}|$.

One may argue that the coding rate of H(X) could be reduced because the sender and receiver
share the same side information R and I(X; R) > 0 is possible. In other words, a compressor
may be appended to the encoder in Fig. 1 in order to reduce the size of the ciphertext. This
configuration is shown in Fig. 3. However, we cannot simply apply the results from source coding
with side information here, because the ciphertext still needs to satisfy the security constraint. If
the new output Y satisfies the perfect secrecy and zero-error constraints, I(U; Y) = H(U | RY) =
0, then (R, U, Y) in Fig. 3 is simply another EPS system, governed by Theorem 1.

Fig. 3. Compressing the output of an EPS cipher.



To complete this section, we show that the lower bounds (11) and (12) are simultaneously
achievable using a one-time pad [8].

Definition 2 (One-time pad): Without loss of generality, let $\mathcal{U} = \{0, \ldots, M-1\}$ be the support
of U. Let R be independent of U and uniformly distributed in $\mathcal{U}$ and let X be generated according
to the one-time pad as X = (U + R) mod M. Then U can be recovered via (X - R) mod M.

It is easy to verify that (5)-(7) are satisfied and H(X) = H(R) = log M. Therefore, we
have proved the following theorem.

Theorem 3 (Achieving the minimum H(X) and H(R)): Let $\mathcal{U}$ be the support of U. The
one-time pad of Definition 2 is an EPS system achieving $H(X) = \log|\mathcal{U}|$ and $H(R) = \log|\mathcal{U}|$.
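
The claims of Definition 2 and Theorem 3 can be verified by brute-force enumeration for a small alphabet. The sketch below is illustrative only: the helper names and the choice of source distribution (borrowed from Example 1) are assumptions, and decoding uses subtraction modulo M.

```python
from itertools import product
from math import log2

def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0.0) + p
    return out

def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for coordinate tuples a and b."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))

def one_time_pad(p_u):
    """Joint pmf of (U, R, X) with R uniform and X = (U + R) mod M."""
    M = len(p_u)
    joint = {}
    for u, r in product(range(M), repeat=2):
        x = (u + r) % M
        assert (x - r) % M == u  # the receiver's decoding rule
        joint[(u, r, x)] = p_u[u] / M
    return joint

joint = one_time_pad((0.3, 0.3, 0.3, 0.1))
U, R, X = (0,), (1,), (2,)
i_ux = mutual_information(joint, U, X)                       # constraint (5)
h_u_given_rx = entropy(joint) - entropy(marginal(joint, R + X))  # constraint (6)
i_ur = mutual_information(joint, U, R)                       # constraint (7)
h_x = entropy(marginal(joint, X))
h_r = entropy(marginal(joint, R))
print(i_ux, h_u_given_rx, i_ur, h_x, h_r)
```

For M = 4 all three EPS constraints evaluate to zero (up to floating-point error) and H(X) = H(R) = 2 bits, even though the source is not uniform.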

III. Multiple Messages and Key Consumption

In Section II, Theorem 3 proved that the one-time pad is "optimal" in the sense that it
simultaneously minimizes H(X) and H(R). This immediately suggests that the one-time pad
leaves no room for improvement. However, this conclusion in fact stems from a folk theorem 
that the "required size of the secret key" is measured by the key entropy. The hidden assumption 
behind this folklore is that the cipher system is used only once. In typical practice, a cipher 
system will be used repeatedly for the transmission of multiple messages. 

Consider the following scenario. Suppose an initial secret key R is delivered to the sender 
and the receiver prior to commencement of message transmission. Now, suppose the sender uses 
this key to encrypt a message U, which is then delivered to the receiver over the public channel. 
Clearly, some portion of the secret randomness R has now been used. The central question is as 
follows: Can the sender and receiver continue to securely communicate without first receiving a 
new key? For example, if U is a single bit and R is a 100-bit random key, it is indeed likely that
another message can be securely transmitted. The natural questions are: What is the maximum
size of the second message? Alternatively, how much of the key R was consumed in the first
round of transmission? Below, we will show that when an error-free perfect-secrecy system is
used multiple times, the key consumption should not be measured by H(R) but by I(R; UX).
In fact, with respect to our definitions, we will exhibit systems with key consumption that can
be made arbitrarily close to H(U).

The following example illustrates some of the basic ideas which will be elaborated in this 
section. 

Example 2: Suppose the sender and the receiver share a secret key $R = \{B_1, B_2, \ldots, B_n\}$,
where all of the $B_i$, $i = 1, 2, \ldots, n$, are independent and uniformly distributed over {0, 1}. Let
$P_U(0) = 0.5$ and $P_U(1) = P_U(2) = 0.25$. Construct a new random variable U' such that

$$
U' = \begin{cases}
(0, B_{n+1}), & U = 0 \\
(1, 0), & U = 1 \\
(1, 1), & U = 2
\end{cases} \tag{25}
$$

where $B_{n+1}$ is generated by the sender independently of U and R such that
$P_{B_{n+1}}(0) = P_{B_{n+1}}(1) = 0.5$.

Let $K = (B_1, B_2)$ and $X = U' \oplus K$. Upon receiving X, the receiver can decode U from X
and K, where K is solely a function of R. In fact, if U = 0, the receiver can further decode
$B_{n+1}$. Let

$$
R' = \begin{cases}
(B_3, B_4, \ldots, B_n), & U \in \{1, 2\} \\
(B_3, B_4, \ldots, B_n, B_{n+1}), & U = 0.
\end{cases}
$$

We refer to R' as the residual secret randomness shared by the sender and the receiver. Note
that R' may not be a deterministic function of R, as the new shared common randomness can be
generated by a probabilistic encoder. According to (25), a new random bit is secretly transmitted
from the sender to the receiver when U = 0. After the system is used once, the expected key
consumption is therefore given by

$$P_U(0) \cdot 1 + P_U(1) \cdot 2 + P_U(2) \cdot 2 = 1.5 = H(U), \tag{26}$$

which happens to also equal I(R; UX). It turns out that this is not mere coincidence.
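
The claim that I(R; UX) equals 1.5 bits in Example 2 can be checked by enumerating the joint distribution. The sketch below assumes n = 3 key bits (any n of at least 2 gives the same value); the variable names are illustrative, and the fresh bit $B_{n+1}$ is summed out because it is not part of R.

```python
from itertools import product
from math import log2

def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0.0) + p
    return out

P_U = {0: 0.5, 1: 0.25, 2: 0.25}

# Joint pmf of (U, R, X) with R = (B1, B2, B3); b_new plays the role of B_{n+1}.
joint = {}
for u, pu in P_U.items():
    for b1, b2, b3, b_new in product((0, 1), repeat=4):
        u_prime = (0, b_new) if u == 0 else (1, u - 1)   # mapping (25)
        x = (u_prime[0] ^ b1, u_prime[1] ^ b2)           # X = U' xor K, K = (B1, B2)
        key = (u, (b1, b2, b3), x)
        joint[key] = joint.get(key, 0.0) + pu / 16

U, R, X = (0,), (1,), (2,)
h_u = entropy(marginal(joint, U))
i_r_ux = (entropy(marginal(joint, R)) + entropy(marginal(joint, U + X))
          - entropy(joint))
i_ux = h_u + entropy(marginal(joint, X)) - entropy(marginal(joint, U + X))
h_u_given_rx = entropy(joint) - entropy(marginal(joint, R + X))
print(h_u, i_r_ux, i_ux, h_u_given_rx)
```

The enumeration gives H(U) = I(R; UX) = 1.5 bits, with I(U; X) = 0 and H(U | RX) = 0, so the scheme is an EPS system whose key consumption matches the source entropy.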

We now define three parameters whose operational meanings are justified in the rest of this 
section. 

Definition 3: The residual secret randomness of an error-free perfect-secrecy system is

$$H(R \mid UX).$$

Definition 4: The expected key consumption of an error-free perfect-secrecy system is

$$I(R; UX).$$

Definition 5: The excess key consumption of an error-free perfect-secrecy system is

$$I(R; X).$$

Roughly speaking, we will show that after an EPS system is used once, H(R | UX) is the
amount of remaining key that can be used for encryption of the next message. Since the sender
and the receiver initially share a quantity H(R) of secret randomness, the key consumption is
equal to H(R) - H(R | UX) = I(R; UX). We will provide achievable schemes to show that the
minimal key consumption is H(U) and hence, the excess key consumption is I(R; UX) - H(U),
which is equal to I(R; X) in an EPS system.

We first justify Definition 3. Consider the scenario of Fig. 4, in which the sender and receiver
share a secret key R, and two EPS systems are used sequentially by the sender to securely
transmit two (possibly correlated) messages U and V. In the first round, the sender encodes
the message U into X, which is transmitted to the receiver as described in Section II. In the
second round, the sender further encodes V (or more generally both U and V) into Y, which
is then transmitted to the receiver. As before, we require H(U | RX) = H(V | RXY) = 0 and
I(UV; XY) = 0 to ensure zero-error decoding and perfect secrecy.

July 10, 2012 DRAFT



Fig. 4. Using an error-free perfect-secrecy system twice.



Theorem 4 (Justification 1): Consider the two-round error-free perfect-secrecy system of
Fig. 4. If

$$I(UV; XY) = H(U \mid RX) = H(V \mid RXY) = 0, \tag{27}$$

then the entropy of the second message V conditioned on the first message U is upper bounded
by the residual secret randomness,

$$H(V \mid U) \le H(R \mid UX). \tag{28}$$

Proof: Note that

$$
\begin{aligned}
& H(R \mid UX) - H(V \mid U) + I(UV; XY) + H(U \mid RX) + H(V \mid RXY) \\
& \quad = I(VY; U \mid RX) + I(RU; Y \mid X) + I(U; X) + H(U \mid RXY) + H(R \mid UVXY) \ge 0.
\end{aligned}
$$

Together with (27), (28) is verified. ■
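
The identity used in the proof holds for every joint distribution of (R, U, V, X, Y), which can be spot-checked numerically. The following sketch is not from the paper: it draws an arbitrary random joint pmf over five binary variables (the seed and alphabet sizes are arbitrary choices) and evaluates both sides from joint entropies.

```python
import random
from itertools import product
from math import log2

NAMES = "RUVXY"  # coordinate order of the joint pmf

def H(joint, names):
    """Joint entropy in bits of the variables named in `names`."""
    idx = [NAMES.index(c) for c in names]
    marg = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in idx)
        marg[k] = marg.get(k, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def Hc(joint, a, b):
    """Conditional entropy H(a | b)."""
    return H(joint, a + b) - H(joint, b)

def I(joint, a, b, c=""):
    """Conditional mutual information I(a; b | c)."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)

random.seed(0)
outcomes = list(product((0, 1), repeat=5))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

lhs = (Hc(joint, "R", "UX") - Hc(joint, "V", "U") + I(joint, "UV", "XY")
       + Hc(joint, "U", "RX") + Hc(joint, "V", "RXY"))
rhs = (I(joint, "VY", "U", "RX") + I(joint, "RU", "Y", "X") + I(joint, "U", "X")
       + Hc(joint, "U", "RXY") + Hc(joint, "R", "UVXY"))
print(lhs, rhs)
```

Every term on the right-hand side is a conditional entropy or a (conditional) mutual information and is therefore nonnegative; under (27) the last three terms on the left vanish, which leaves (28).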
Theorem 4 implies that the maximum amount of information which can be secretly transmitted
in the second round is upper bounded by the residual secret randomness H(R | UX), suggesting
that H(R | UX) indeed measures the amount of key unused in the first round. Equivalently,
the amount of key that has been consumed in the first round is equal to

$$I(R; UX) = H(R) - H(R \mid UX).$$

Whereas Theorem 4 justifies the residual key H(R | UX) as bounding the entropy of the
second-round message, we now offer an alternative justification, showing that the size of the
key that can be extracted after n uses of an EPS system is about nH(R | UX).

Consider generation of a new secret key as shown in Fig. 5. Suppose a sequence of EPS
systems $\{(U_i, R_i, X_i)\}_{i=1}^n$ has been used by a sender and a receiver, where the $(U_i, R_i, X_i)$ are i.i.d.
with generic distribution $P_{URX}$. We use (U, R, X) to denote the generic random variables. In
order to securely send additional messages, the sender and the receiver aim to establish a new
secret key $S^m = (S_1, \ldots, S_m)$, where the $S_i$ are i.i.d. with generic distribution $P_S$. To generate
the new key $S^m$, we assume that the sender can send a secret message A to the receiver. The
new secret key $S^m$ will be used to encrypt a second sequence of messages $V^m$, generating a
ciphertext sequence $Y^m$ such that $\{(V_i, S_i, Y_i)\}_{i=1}^m$ is another sequence of EPS systems.



Fig. 5. Generating a new secret key $S^m$.



Assume

$$I(V^m; S^m U^n X^n) = 0, \tag{29}$$

$$I(U^n X^n; Y^m \mid V^m S^m) = 0, \tag{30}$$

$$H(S^m \mid R^n U^n X^n A) = 0, \tag{31}$$

$$I(S^m; U^n X^n) = 0. \tag{32}$$

These assumptions are adopted with the following reasoning. We assume in (29) that the new
message is generated independently of the previous uses of the EPS systems. Also, (30)
holds because $(U^n, X^n) - (V^m, S^m) - Y^m$ forms a Markov chain. The sender and the receiver
can agree on $S^m$ without error due to (31). The justification of (32) is given as follows.

Although $\{(U_i, R_i, X_i)\}_{i=1}^n$ and $\{(V_i, S_i, Y_i)\}_{i=1}^m$ are individually sequences of EPS systems, it
is possible that their combination is not secure, $I(X^n, Y^m; U^n, V^m) > 0$. For example, suppose
$U^n$ and $V^m$ are i.i.d. with uniform distribution and m = n. If $S^m = U^n$, then using a one-time
pad, $I(Y^m; V^m) = 0$ but $I(X^n, Y^m; U^n, V^m) \ge H(U^n)$. The following theorem shows that joint
EPS systems satisfying (29)-(32) are still perfectly secure.
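
The failure mode in this example can be made concrete with single bits (n = m = 1). The sketch below is an illustration under stated assumptions: U, V and R are independent fair bits, both rounds use a binary one-time pad, and the second-round key is set to S = U, which is the insecure choice described above.

```python
from itertools import product
from math import log2

NAMES = "UVRSXY"  # coordinate order of the joint pmf

def H(joint, names):
    """Joint entropy in bits of the variables named in `names`."""
    idx = [NAMES.index(c) for c in names]
    marg = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in idx)
        marg[k] = marg.get(k, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def I(joint, a, b):
    """Mutual information I(a; b)."""
    return H(joint, a) + H(joint, b) - H(joint, a + b)

joint = {}
for u, v, r in product((0, 1), repeat=3):
    x = u ^ r   # first round: one-time pad with key R
    s = u       # insecure choice: reuse the first message as the new key
    y = v ^ s   # second round: one-time pad with key S
    joint[(u, v, r, s, x, y)] = 1 / 8

i_ux = I(joint, "U", "X")        # first round alone is perfectly secret
i_vy = I(joint, "V", "Y")        # second round alone is perfectly secret
i_joint = I(joint, "XY", "UV")   # leakage of the combined system
i_s_ux = I(joint, "S", "UX")     # quantity required to vanish by (32)
print(i_ux, i_vy, i_joint, i_s_ux)
```

Each round is individually perfectly secret, yet the pair (X, Y) leaks one full bit about (U, V), and the quantity in (32) equals H(U) rather than zero.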

Theorem 5: Consider two sequences of i.i.d. EPS systems $\{(U_i, R_i, X_i)\}_{i=1}^n$ and
$\{(V_i, S_i, Y_i)\}_{i=1}^m$ satisfying (29)-(32). Then the joint EPS system is still perfectly secure,

$$I(X^n Y^m; U^n V^m) = 0. \tag{33}$$

Proof: By assumption,

$$
\begin{aligned}
I(U^n; X^n) &= I(V^m; Y^m) = I(V^m; S^m, U^n, X^n) \\
&= I(U^n, X^n; Y^m \mid V^m S^m) = I(S^m; U^n, X^n) = 0.
\end{aligned} \tag{34}
$$

Note that

$$
\begin{aligned}
& I(S^m; U^n, X^n) + I(V^m; S^m, U^n, X^n) + I(V^m; Y^m) \\
& \quad + I(U^n; X^n) + I(U^n, X^n; Y^m \mid V^m S^m) - I(X^n, Y^m; U^n, V^m) \\
& = I(U^n, X^n; S^m \mid V^m, Y^m) + I(V^m; S^m) + I(X^n; Y^m) + I(U^n; V^m) \ge 0.
\end{aligned}
$$

Together with (34), $I(X^n, Y^m; U^n, V^m) \le 0$. Since $I(X^n, Y^m; U^n, V^m) \ge 0$, (33) is verified. ■

In order to generate a new key $S^m$, a secret auxiliary random variable A is sent from the
sender to the receiver. Here, A is generated by a probabilistic encoder with $\{(R_i, U_i, X_i)\}_{i=1}^n$ as
input. In Example 2 above, suppose we wanted to restore n secret bits after the system is used
once. Then A is a fair bit if U = 0 and A consists of two fair bits if U = 1 or 2. We measure
the expected size of A by $H(A \mid R^n U^n X^n)$. Since we can directly treat A as the new secret
key $S^m$, it is reasonable to expect that $H(S^m) \ge H(A \mid R^n U^n X^n)$. Therefore, it is of interest
to know by how much $H(S^m)$ can exceed $H(A \mid R^n U^n X^n)$ for a given sequence of EPS
systems. The following theorem shows that the secret randomness, which can be extracted from
$\{(R_i, U_i, X_i)\}_{i=1}^n$ with help from A, is measured by the residual secret randomness H(R | UX).

Theorem 6 (Justification 2): Consider two sequences of i.i.d. EPS systems $\{(U_i, R_i, X_i)\}_{i=1}^n$
and $\{(V_i, S_i, Y_i)\}_{i=1}^m$ and any A. If (29)-(32) are satisfied, then

$$H(S^m) - H(A \mid R^n U^n X^n) \le n H(R \mid UX). \tag{35}$$

On the other hand, it is possible to generate $S^m$ such that (29)-(32) are satisfied and

$$H(S^m) - H(A \mid R^n U^n X^n) \ge n H(R \mid UX) - \log 2 \tag{36}$$

for a sufficiently large m such that

$$\max_{s^m} P_{S^m}(s^m) \le \min_{r^n, u^n, x^n} P_{R^n | U^n X^n}(r^n \mid u^n, x^n).$$

Proof: We first prove (35) by showing that

$$
\begin{aligned}
H(S^m) &= I(S^m; U^n X^n) + H(S^m \mid A R^n U^n X^n) + I(S^m; A R^n \mid U^n X^n) && (37) \\
&= I(S^m; A R^n \mid U^n X^n) && (38) \\
&\le H(A R^n \mid U^n X^n) && (39) \\
&= H(A \mid R^n U^n X^n) + H(R^n \mid U^n X^n) && (40) \\
&= H(A \mid R^n U^n X^n) + n H(R \mid UX), && (41)
\end{aligned}
$$

where (38) follows from (31)-(32) and (41) follows from the fact that $\{(U_i, R_i, X_i)\}_{i=1}^n$ is a
sequence of i.i.d. EPS systems.
The proof of the achievability part in (36) is via construction. With reference to Fig. 6, consider
two partitions of the unit interval into disjoint "cells". The width of cell i in the first partition is
$P_{S^m}(i)$ for $1 \le i \le |\mathcal{S}|^m$, where $\mathcal{S}$ is the support of $S_i$. Consider $U^n = u^n$ and $X^n = x^n$. The
width of cell i in the second partition is $P_{R^n | U^n X^n}(i \mid u^n, x^n)$ for $1 \le i \le |\mathcal{R}|^n$. The distribution
of A is constructed to divide the second partition as shown in Fig. 6. To simplify notation, we
consider the support of $R^n$ to be a set of consecutive integers $\{1, \ldots, |\mathcal{R}'|\}$ when $U^n = u^n$ and
$X^n = x^n$. Suppose $R^n = r$ and let
b = max I J ■■J2^s{^)>Yl Pr-\u-,x- I | • (42) 

I i=l i=l ) 

For j > 1, A is defined by PaH) = s ""^^l , n n\ , where 

j (b+j-l T \ r-1 

^ a{%) =mm\ ^ Psii), ^ Pr^iu'^x^ {i \ u"", a:") > - J]] Pr^\U",x-^ {i \ x"") . (43) 

i=l I i=l 1=1 J j=l 

Fig. 6. An assignment of A.



For the example in Fig. 6, when $R^n = 1$,



P^(l) = ^ = 1 _ p^(2). (44) 



When $R^n = 2$,



Pa 1 = ^ ^ ^ ^ , ^, ' ' — = 1 - Pa 2 . (45) 

By definition, $S^m$ is determined from $R^n$ and A for any fixed $U^n = u^n$ and $X^n = x^n$. On the
other hand, A is also determined from $S^m$ and $R^n$. Therefore,

$$H(S^m \mid A, R^n, U^n, X^n) = H(A \mid S^m, R^n, U^n, X^n) = 0. \tag{46}$$

By choosing m sufficiently large, such that

$$\max_{s^m} P_{S^m}(s^m) \le \min_{r^n, u^n, x^n} P_{R^n | U^n, X^n}(r^n \mid u^n, x^n), \tag{47}$$

$R^n$ can take at most two possible values for any given $(S^m, U^n, X^n)$ and hence

$$H(R^n \mid S^m, U^n, X^n) \le \log 2. \tag{48}$$

Therefore,

$$
\begin{aligned}
H(A \mid R^n U^n X^n) &= I(A; S^m \mid R^n U^n X^n) + H(A \mid S^m R^n U^n X^n) && (49) \\
&= I(A; S^m \mid R^n U^n X^n) && (50) \\
&= I(A; S^m \mid R^n U^n X^n) + H(S^m \mid A R^n U^n X^n) && (51) \\
&= H(S^m \mid R^n U^n X^n) && (52) \\
&= H(S^m) - H(R^n \mid U^n X^n) + H(R^n \mid S^m U^n X^n) - I(S^m; U^n, X^n) && (53) \\
&= H(S^m) - H(R^n \mid U^n X^n) + H(R^n \mid S^m U^n X^n) && (54) \\
&\le H(S^m) - H(R^n \mid U^n X^n) + \log 2, && (55)
\end{aligned}
$$

where (50) and (51) follow from (46), (54) follows from (32), and (55) follows from (48). Since
$\{(U_i, R_i, X_i)\}_{i=1}^n$ is a sequence of i.i.d. EPS systems, (36) is verified.

For any $(u^n, x^n)$, the same $P_{S^m | U^n, X^n} = P_{S^m}$ is generated. Therefore, (32) is verified. Since
$S^m$ is determined by $(R^n, U^n, X^n, A)$, (31) is verified, and (29) can also be verified as $V^m$ is
independent of $(R^n, U^n, X^n, A)$. Finally, (30) is due to the fact that $\{(V_i, S_i, Y_i)\}_{i=1}^m$ is a sequence
of EPS systems. ■

Roughly speaking, Theorem 6 shows that for large n and m, the optimal algorithm with the
help of A can extract approximately

$$n H(R \mid UX)$$

bits of residual secret randomness from $\{(R_i, U_i, X_i)\}_{i=1}^n$. In [9], we considered another algorithm
generating a new secret key with asymptotic rate H(R | UX) without using an auxiliary secret
random variable. As the sender and receiver initially share nH(R) bits of secret randomness,
the expected key consumption for each use of the EPS system is

$$H(R) - H(R \mid UX) = I(R; UX),$$

the quantity proposed in Definition 4. Next, we exhibit an important property of I(R; UX).

Theorem 7: In an error-free perfect-secrecy system, the expected key consumption is lower
bounded by the source entropy,

$$I(R; UX) \ge H(U), \tag{56}$$

where equality holds if and only if I(R; X) = 0.

Proof: The information diagram for the random variables U, X, R involved in an error-free
perfect-secrecy system satisfying (5)-(7) is shown in Fig. 7(a). It is easy to verify that

$$I(X; R) = I(R; UX) - H(U). \tag{57}$$

Since $I(X; R) \ge 0$, Theorem 7 is proved. ■
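
For readers without the information diagram at hand, (57) can also be obtained by a short chain-rule argument. The derivation below is a sketch added for convenience, using only the chain rule for mutual information and the constraints (5) and (6):

```latex
\begin{align*}
I(R; UX) &= I(R; X) + I(R; U \mid X) && \text{(chain rule)} \\
         &= I(R; X) + H(U \mid X) - H(U \mid RX) \\
         &= I(R; X) + H(U \mid X) && \text{by (6)} \\
         &= I(R; X) + H(U) && \text{by (5).}
\end{align*}
```

Rearranging gives (57), and the nonnegativity of I(R; X) then yields (56) together with its equality condition.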




Fig. 7. Information diagrams. (a) General EPS system. (b) Minimum expected key consumption,
achieving equality in (56).

In Section IV-A, we will describe several EPS coding schemes achieving I(R; UX) = H(U).
Therefore, I(X; R) measures the difference between the expected key consumption of an
EPS system and the minimum possible key consumption, again justifying Definition 5. The
information diagram for the optimal case I(X; R) = 0 is shown in Fig. 7(b).

We summarize this section in the following three remarks. 

1) Theorems 4 and 6 provide strong justification of I(R; UX) as the expected key consumption
required to achieve error-free perfect secrecy. Theorem 7 shows that the expected
key consumption cannot be less than the source entropy. Recall that Theorem 1 gives the
lower bound on the initial key requirement. Therefore, we have distinguished between
two different concepts: (a) expected key consumption in a multi-round system and (b) the
initial key requirement for a one-shot system. In contrast to the bound $H(R) \ge H(U)$ [1],
[2], Theorem 7 more precisely describes the role of H(U) in an error-free perfect-secrecy
system.

2) From (5)-(7) we can show that

$$H(R) = H(U) + I(X; R) + H(R \mid UX). \tag{58}$$

Thus the key entropy H(R) consists of three parts: the randomness used to protect the
source, the excess key consumption and the residual secret randomness.

3) If the source distribution is uniform, Example 3 below shows that the one-time pad achieves
minimal key consumption.

Example 3 (Uniform source distribution): Suppose U and R are independent and are uniformly
distributed on the sets $\{0, 1, \ldots, 2^i - 1\}$ and $\{0, 1, \ldots, 2^j - 1\}$, respectively, where $i \le j$.
In order to derive a coding system satisfying (5)-(7), we can first extract i random bits R'
from R and construct X as the modulo-two addition of the binary representation of U and R'.
Then

$$
\begin{aligned}
I(R; UX) &= H(R) - H(R \mid UX) && (59) \\
&= H(R) - H(R \mid R') && (60) \\
&= j - (j - i) && (61) \\
&= H(U). && (62)
\end{aligned}
$$
IV. Tradeoff between Key Consumption and Number of Channel Uses 

Example 3 shows that the one-time pad simultaneously achieves the minimal expected key
consumption and the minimum number of channel uses for a uniform source. However, for
general non-uniform sources, we will show that there is a non-trivial tradeoff between these two
quantities.

We will consider two important regimes. First, in Section IV-A, we will consider the regime
in which ciphers minimize the key consumption $I(R;UX)$. Conversely, in Section IV-B, we
consider systems which minimize the number of channel uses $H(X)$.

We shall demonstrate the existence of a fundamental, non-trivial tradeoff between the expected
key consumption and the number of channel uses. Our main results, Theorem 1 proved earlier
and Theorems 7-14 to be proved below, are summarized in Fig. 8.

Point 1 is due to Theorem 14 in Section IV-B below, and has the smallest $I(R;UX)$ among
all EPS systems with $H(X) = \log|\mathcal{U}|$. We shall show that this point can always be achieved by
the one-time pad.

Point 2 has the smallest $H(X)$ among all the EPS systems with $I(R;UX) = H(U)$. For this
point, Theorem 11 in Section IV-A gives the lower bound on $H(X)$, which is strictly greater
than $\log|\mathcal{U}|$ if $P_U$ is not uniform.

If $P_U$ has only rational probability masses, Theorem 8 in Section IV-A below shows that Point
3 can be achieved by a generalization of the one-time pad, the partition code (to be introduced
in Definition 6).

If all the probability masses in $P_U$ are integer multiples of the smallest probability mass
in $P_U$, then Point 2 coincides with Point 3 by the partition code, as shown in Theorem 13. Otherwise,
Point 3 can differ from Point 2, which will be demonstrated in Example 4.

The existence, continuity and non-increasing in $H(X)$ properties of the curved portion of the
tradeoff curve are established in Section IV-C.



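[Figure 8: plot of $I(R;UX)$ (vertical axis) against $H(X)$ (horizontal axis) showing the tradeoff
curve with Points 1, 2 and 3 marked. Vertical-axis ticks: $\log|\mathcal{U}|$ and $H(U)$. Horizontal-axis
ticks: $\log|\mathcal{U}|$ and $\log\theta$.]

Fig. 8. Tradeoff between $I(R;UX)$ and $H(X)$.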



July 10, 2012 



DRAFT 






A. Minimal expected key consumption 

We first consider EPS systems which achieve minimal expected key consumption. From
Theorem 7, an error free perfect secrecy system with minimal key consumption satisfies (5)-(7)
and

$$I(X;R) = 0. \tag{63}$$

We now generalize the one-time pad to achieve minimal key consumption for source distributions
containing only rational probability masses.

Definition 6 (Partition Code $C(\Psi)$): Assume that $U$ is a random variable defined on
$\{1, \ldots, \ell\}$. Let $\Psi = (\psi_1, \psi_2, \ldots, \psi_\ell)$ and let $\theta = \sum_{i=1}^{\ell} \psi_i$, where $\psi_i$ and $\theta$ are positive integers.
Let $A'$ be a random variable such that

$$\Pr\{A' = j \mid U = i\} = \begin{cases} 1/\psi_i & \text{if } 1 \le j \le \psi_i, \\ 0 & \text{otherwise.} \end{cases}$$

Let $A = \sum_{i=1}^{U-1} \psi_i + A' - 1$, let $R$ be uniformly distributed on the set $\{0, 1, \ldots, \theta - 1\}$ and let
$X = A + R \bmod \theta$. The so defined cipher system $(R, U, X)$ is called the partition code $C(\Psi)$.
Note that the one-time pad is a special case of the partition code when $\Psi = (1, 1, \ldots, 1)$.

It can be proved directly that a partition code satisfies (5)-(7) and hence is an EPS system.
Furthermore, we can verify that

$$H(X) = H(R) = \log\theta, \tag{64}$$

and

$$I(R;UX) = \sum_{i=1}^{\ell} P_U(i) \log\frac{\theta}{\psi_i}, \tag{65}$$

where (65) is from $H(X \mid U, R) = H(A \mid U, R) = \sum_{i=1}^{\ell} P_U(i) \log\psi_i$ together with (64).

Let $Q_U$ be the probability distribution such that $Q_U(i) = \psi_i/\theta$. Then (65) can be rewritten as

$$I(R;UX) = H(U) + D(P_U \| Q_U), \tag{66}$$

where $D(\cdot \| \cdot)$ is the relative entropy [4]. Consequently, we have the following theorem.

Theorem 8: Suppose the probability mass $P_U(i)$ is rational for all $i = 1, \ldots, \ell$. Let $\theta$ be
an integer such that $\theta \cdot P_U(i)$ is also an integer for all $i$, and let $\Psi = (\psi_1, \psi_2, \ldots, \psi_\ell)$ with
$\psi_i = \theta \cdot P_U(i)$. Then the EPS system $(R, U, X)$ induced by the partition code $C(\Psi)$ achieves the
lower bound in (56), namely $I(R;UX) = H(U)$.
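
To make Definition 6 and Theorem 8 concrete, the following Python sketch builds the joint distribution of a partition code and evaluates $I(R;UX)$ and $I(U;X)$ by direct enumeration. It is a minimal illustration under stated assumptions: the function names are hypothetical, $A'$ is drawn independently of $R$, and the source $P_U = (0.9, 0.1)$ is chosen only as a small rational example.

```python
from fractions import Fraction
from math import log2


def partition_code_joint(p_u, psi):
    """Joint distribution P(r, u, x) of the partition code C(psi).

    p_u: source probabilities for U = 1..l; psi: positive integers (psi_1..psi_l).
    """
    theta = sum(psi)
    joint = {}
    offset = 0  # psi_1 + ... + psi_{u-1}
    for u, (pu, k) in enumerate(zip(p_u, psi), start=1):
        for a_prime in range(1, k + 1):      # A' uniform on {1,...,psi_u}
            a = offset + a_prime - 1
            for r in range(theta):           # R uniform on {0,...,theta-1}
                x = (a + r) % theta
                joint[(r, u, x)] = pu * Fraction(1, k * theta)
        offset += k
    return joint


def marginal(joint, idx):
    """Marginalize the joint distribution onto the coordinates listed in idx."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0) + p
    return out


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(float(p) * log2(float(p)) for p in dist.values() if p > 0)


def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for coordinate tuples a and b."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))


p_u = [Fraction(9, 10), Fraction(1, 10)]
for psi in ([9, 1], [1, 1]):
    joint = partition_code_joint(p_u, psi)
    key_use = mutual_information(joint, (0,), (1, 2))   # I(R;UX)
    leak = mutual_information(joint, (1,), (2,))        # I(U;X)
    print(psi, f"I(R;UX) = {key_use:.3f}", f"I(U;X) = {abs(leak):.3f}")
```

With $\Psi = (9, 1)$ the computed $I(R;UX)$ should coincide with $H(U)$, as Theorem 8 states, whereas $\Psi = (1, 1)$ reduces to the one-time pad and consumes a full bit of key.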

In the following theorem, we prove that if the source distribution $P_U$ is not rational, then
the partition code will not achieve zero key-excess with finite $\mathcal{X}$ or $\mathcal{R}$. Its proof is deferred to
Section VI.

Theorem 9: Suppose $U$, $X$, and $R$ satisfy (5)-(7) and (63). If there exists $u \in \mathcal{U}$ such that
$P_U(u)$ is irrational, then the supports of $X$ and $R$ cannot be finite.

Although it is difficult to construct codes satisfying (5)-(7) and (63) for $P_U$ having irrational
probability masses, Theorem 7 still gives a tight bound on $I(R;UX)$, as shown in the following
theorem.

Theorem 10: Suppose the support of $P_U$ is a finite set of integers $\{1, \ldots, \ell\}$. Let $\Psi =
(\psi_1, \ldots, \psi_{\ell+1})$ with

$$\psi_i = \begin{cases} \lfloor \theta P_U(i) \rfloor & \text{if } 1 \le i \le \ell, \\ \theta - \sum_{k=1}^{\ell} \psi_k & \text{if } i = \ell + 1. \end{cases}$$

Assume that $\theta$ is large enough such that $\lfloor P_U(i)\theta \rfloor \ge 1$ for all $1 \le i \le \ell$. For the partition code
$C(\Psi)$, $I(R;UX) \to H(U)$ as $\theta \to \infty$.

Proof: Consider a probability distribution $Q_U$ with $Q_U(i) = \psi_i/\theta$ for $1 \le i \le \ell + 1$. As
$\theta \to \infty$, $Q_U$ converges pointwise to $P_U$ and hence $D(P_U \| Q_U) \to 0$ for finite $\ell$. The theorem
thus follows from (66). ■

In addition to minimizing the key consumption $I(R;UX)$, we may also want to simultaneously
minimize $H(X)$, which is the number of channel uses required to convey the ciphertext $X$. The
following theorem and corollary illustrate that the zero key-excess condition can be very harsh,
requiring the EPS system to have a very large $H(R)$ and $H(X)$, even for very simple sources.

Theorem 11 (EPS systems with minimal $I(R;UX)$): Let $\mathcal{X}$, $\mathcal{R}$ and $\mathcal{U}$ be the respective
supports of random variables $X$, $R$, and $U$ satisfying (5)-(7) and (63). Then

$$\max_{x \in \mathcal{X}} P_X(x) \le \min_{u \in \mathcal{U}} P_U(u) \tag{67}$$

and

$$\max_{r \in \mathcal{R}} P_R(r) \le \min_{u \in \mathcal{U}} P_U(u). \tag{68}$$

Proof: Consider any $u \in \mathcal{U}$ and $x \in \mathcal{X}$. By definition, $P_U(u) > 0$ and $P_X(x) > 0$.
From (5), we have $P_{UX}(u, x) = P_U(u) P_X(x) > 0$. Consequently, there exists $r \in \mathcal{R}$ such that
$P_{UXR}(u, x, r) > 0$. Notice that

\begin{align}
P_{UXR}(u, x, r) &= P_{XR}(x, r) \tag{69} \\
&= P_X(x) P_R(r), \tag{70}
\end{align}

where (69) is due to (6) and (70) is due to (63). On the other hand,

\begin{align}
P_{UXR}(u, x, r) &\le P_{UR}(u, r) \tag{71} \\
&= P_U(u) P_R(r), \tag{72}
\end{align}

where (72) is due to (7). Finally, as $P_R(r) > 0$, we have $P_X(x) \le P_U(u)$ and (67) follows. Due
to the symmetric roles of $X$ and $R$, the theorem is proved. ■

The results in Theorem 11 are used to obtain bounds on $H(X)$ and $H(R)$ in the following
corollary. Define the binary entropy function $h(\gamma) = -\gamma \log\gamma - (1 - \gamma) \log(1 - \gamma)$ for $0 < \gamma < 1$
and $h(0) = h(1) = 0$.

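Corollary 12: Let $\mathcal{X}$, $\mathcal{R}$ and $\mathcal{U}$ be the respective supports of random variables $X$, $R$, and $U$
satisfying (5)-(7) and (63). Then

\begin{align}
\min\{H(X), H(R)\} &\ge h(\pi \lfloor \pi^{-1} \rfloor) + \pi \lfloor \pi^{-1} \rfloor \log \lfloor \pi^{-1} \rfloor \tag{73} \\
&\ge \log\frac{1}{\pi}, \tag{74}
\end{align}

where $\pi = \min_{u \in \mathcal{U}} P_U(u)$ and the right sides of (73) and (74) are equal if and only if $\pi^{-1}$ is an
integer.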

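Proof: From (67), $\max_{x \in \mathcal{X}} P_X(x) \le \min_{u \in \mathcal{U}} P_U(u)$. Together with [7, Theorem 10], this
establishes (73). To prove (74), we first consider the case when $\pi^{-1}$ is an integer. Then

\begin{align*}
h(\pi \lfloor \pi^{-1} \rfloor) + \pi \lfloor \pi^{-1} \rfloor \log \lfloor \pi^{-1} \rfloor &= h(1) + \pi \pi^{-1} \log\frac{1}{\pi} \\
&= \log\frac{1}{\pi}.
\end{align*}

If $\pi^{-1}$ is not an integer, then

$$1 - \pi \lfloor \pi^{-1} \rfloor < \pi.$$

Hence,

\begin{align*}
& h(\pi \lfloor \pi^{-1} \rfloor) + \pi \lfloor \pi^{-1} \rfloor \log \lfloor \pi^{-1} \rfloor \\
&= \pi \lfloor \pi^{-1} \rfloor \log\frac{1}{\pi \lfloor \pi^{-1} \rfloor} + (1 - \pi \lfloor \pi^{-1} \rfloor) \log\frac{1}{1 - \pi \lfloor \pi^{-1} \rfloor} + \pi \lfloor \pi^{-1} \rfloor \log \lfloor \pi^{-1} \rfloor \\
&> \pi \lfloor \pi^{-1} \rfloor \log\frac{1}{\pi} + (1 - \pi \lfloor \pi^{-1} \rfloor) \log\frac{1}{\pi} \\
&= \log\frac{1}{\pi}.
\end{align*}

Furthermore, the right hand sides of (73) and (74) are equal only if $\pi^{-1}$ is an integer. This proves
the lower bounds on $H(X)$. Due to the symmetric roles of $X$ and $R$, the corollary is proved. ■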
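
The two bounds of Corollary 12 are easy to evaluate for a given source. The Python sketch below computes the right sides of (73) and (74) from $\pi = \min_u P_U(u)$, using exact rational arithmetic for the floor so that the integer case is detected reliably. The function names and the two test sources are illustrative assumptions.

```python
from fractions import Fraction
from math import floor, log2


def binary_entropy(g):
    """h(g) = -g log g - (1 - g) log(1 - g) in bits, with h(0) = h(1) = 0."""
    g = float(g)
    if g <= 0.0 or g >= 1.0:
        return 0.0
    return -g * log2(g) - (1.0 - g) * log2(1.0 - g)


def corollary12_bounds(p_u):
    """Return the right sides of (73) and (74) for a source given as Fractions."""
    pi = min(p_u)
    k = floor(1 / pi)            # exact floor of 1/pi for Fraction input
    a = pi * k                   # pi * floor(1/pi)
    tight = binary_entropy(a) + float(a) * log2(k)
    loose = log2(float(1 / pi))
    return tight, loose


for source in ([Fraction(3, 5), Fraction(2, 5)], [Fraction(9, 10), Fraction(1, 10)]):
    tight, loose = corollary12_bounds(source)
    print([str(p) for p in source], f"(73): {tight:.4f}", f"(74): {loose:.4f}")
```

For $P_U = (3/5, 2/5)$, where $\pi^{-1} = 2.5$ is not an integer, (73) is strictly larger than (74). For $P_U = (0.9, 0.1)$, where $\pi^{-1} = 10$, the two coincide.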

Suppose $P_U$ is not uniform so that $\min_{u \in \mathcal{U}} P_U(u) < |\mathcal{U}|^{-1}$. In this case, (74) shows that

$$\min\{H(X), H(R)\} > \log|\mathcal{U}|. \tag{75}$$

Comparing with (11) and (12) in Theorem 1, a larger initial key requirement and a larger number
of channel uses are required for systems which achieve the minimal expected key consumption.
The following theorem shows that the lower bounds in (74) can be achieved for certain $P_U$,
including the uniform distribution and $D$-adic distributions, $P_U(u) = D^{-i}$ for certain integers $D$
and $i$.

Theorem 13: Let $\mathcal{U} = \{1, \ldots, \ell\}$ and let $P_U(1) \le P_U(i)$ for $1 \le i \le \ell$. If there exists a set
of positive integers $\Psi = \{\psi_i\}$ such that $P_U(i) = \psi_i P_U(1)$ for $1 \le i \le \ell$, then the partition code
$C(\Psi)$ simultaneously achieves the minimum $H(X)$ and $H(R)$ among all EPS systems achieving
minimal key consumption.

Proof: Suppose $(R, U, X)$ satisfies (5)-(7) and (63) so that $H(X) \ge \log\frac{1}{P_U(1)}$ from (74).
Note that $P_U(1) = \left(\sum_{i=1}^{\ell} \psi_i\right)^{-1}$ from the definition of $\Psi$. Therefore

$$H(X) \ge \log\sum_{i=1}^{\ell} \psi_i. \tag{76}$$

The partition code $C(\Psi)$ has $\theta = \sum_{i=1}^{\ell} \psi_i$ and can achieve equality in (76) from (64).

Similarly, we can argue that the partition code $C(\Psi)$ achieves the minimum $H(R)$. ■

For some other source distributions $P_U$, the partition code may not achieve the minimal number
of channel uses $H(X)$, as illustrated in the following example.

Example 4: Consider an EPS system $(R, U, X)$ such that

1) $U$ is a binary random variable with $P_U(0) = 3/5$.

2) $X$ and $R$ take values from the set $\{0, 1, 2, 3\}$.

3) $P_X(0) = P_R(0) = 2/5$, $P_X(i) = P_R(i) = 1/5$ for $i = 1, 2, 3$.

4) $I(X;R) = 0$ so that $P_{XR}(x, r) = P_X(x) P_R(r)$ for all $x$ and $r$.

5) $U$ is a function of $(X, R)$ such that $U = 0$ if and only if (i) $X = 0$ and $R \ne 0$, or (ii)
$R = 0$ and $X \ne 0$, or (iii) $X = R \ne 0$. Consequently, $P_{U|XR}(u \mid x, r)$ is well-defined.

It is straightforward to check that $(U, X, R)$ satisfies (5)-(7) and (63), and that $H(X) = H(R) <
\log 5$. However, $\theta = 5$ is the smallest integer such that $\theta \cdot P_U(u)$ is an integer. In this example,
$H(X)$ is smaller than the value given in (64). While Theorem 13 shows that the partition code can
simultaneously minimize $H(X)$ and $H(R)$ under the conditions (5)-(7) and (63), this example
shows that the partition code is not necessarily optimal in terms of minimizing $H(X)$ for a general
source.
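
The claims of Example 4 can be verified by exhaustive enumeration. The Python sketch below builds the joint distribution from items 1)-5), checks the independence conditions exactly with rational arithmetic, and compares $H(X)$ with $\log 5$. Variable and function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import log2

# Common marginal of X and R from item 3).
P = {0: Fraction(2, 5), 1: Fraction(1, 5), 2: Fraction(1, 5), 3: Fraction(1, 5)}


def decode(x, r):
    """U as a function of (X, R): U = 0 iff one of conditions (i)-(iii) holds."""
    if (x == 0 and r != 0) or (r == 0 and x != 0) or (x == r and x != 0):
        return 0
    return 1


# X and R are independent by item 4), so P(u, x, r) = P(x) P(r) with u = decode(x, r).
joint = {(decode(x, r), x, r): P[x] * P[r] for x, r in product(P, repeat=2)}


def marginal(idx):
    """Marginalize the joint distribution onto the coordinates listed in idx."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0) + p
    return out


p_u, p_x, p_r = marginal((0,)), marginal((1,)), marginal((2,))
p_ux, p_ur = marginal((0, 1)), marginal((0, 2))

# Exact checks of I(U;X) = 0 and I(U;R) = 0 via product-form factorization.
secrecy = all(p_ux.get((u, x), 0) == p_u[(u,)] * p_x[(x,)] for u in (0, 1) for x in P)
key_indep = all(p_ur.get((u, r), 0) == p_u[(u,)] * p_r[(r,)] for u in (0, 1) for r in P)
h_x = -sum(float(p) * log2(float(p)) for p in p_x.values())

print("P_U(0) =", p_u[(0,)])
print("I(U;X) = 0:", secrecy, " I(U;R) = 0:", key_indep)
print(f"H(X) = {h_x:.4f} bits, log 5 = {log2(5):.4f} bits")
```

Zero decoding error holds by construction because $U$ is computed from $(X, R)$, and $I(X;R) = 0$ holds because the joint distribution of $(X, R)$ is defined as a product.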

B. Minimal number of channel uses 

In the previous subsection, we proposed partition codes $C(\Psi)$ which minimize the expected
key consumption for error free perfect secrecy systems. However, we also demonstrated that these
codes do not guarantee the minimal number of channel uses $H(X)$ among all other EPS systems
which also minimize the expected key consumption. Finding an EPS system which minimizes
the number of channel uses for a given expected key consumption is a very challenging open
problem. In this subsection, we aim to minimize $I(R;UX)$ in the regime where $H(X)$ meets
the lower bound in Theorem 1, $H(X) = \log|\mathcal{U}|$. Unlike in Section IV-A, we can completely
characterize this regime.

Using Theorem 3, we can show that by using the one-time pad,

$$H(U) \le \log|\mathcal{U}| = H(X) = H(R) = I(R;UX).$$

Therefore, in this instance, the expected key consumption $I(R;UX)$ is not minimal when the
source $U$ is not uniform. However, the following theorem shows that among all EPS systems
which minimize the number of channel uses, the one-time pad minimizes the expected key
consumption.

Theorem 14: Consider any EPS system $(R, U, X)$ (e.g., the one-time pad) with $H(X) = \log|\mathcal{U}|$.
Then $I(R;UX) = \log|\mathcal{U}|$ and $H(X \mid RU) = 0$.

Proof: If $H(X) = \log|\mathcal{U}|$, then $P_X(x) = 1/|\mathcal{U}|$ for $x \in \mathcal{X}$ and

$$|\mathcal{X}| = |\mathcal{U}| \tag{77}$$

from Theorem 1. Let

$$\mathcal{X}_{ru} = \{x \in \mathcal{X} : P_{RUX}(r, u, x) > 0\}$$

be the set of possible values of $X$ when $R = r$ and $U = u$. Due to (6), $\mathcal{X}_{ri} \cap \mathcal{X}_{rj} = \emptyset$ if $i \ne j$.
Together with (77),

$$|\mathcal{U}| = |\mathcal{X}| \ge \sum_{u \in \mathcal{U}} |\mathcal{X}_{ru}| \ge |\mathcal{U}| \min_{u \in \mathcal{U}} |\mathcal{X}_{ru}|. \tag{78}$$

On the other hand, for any $r \in \mathcal{R}$ and $u \in \mathcal{U}$,

$$\sum_{x \in \mathcal{X}} P_{RUX}(r, u, x) = P_{UR}(u, r) = P_U(u) P_R(r) > 0$$

from (7), and hence $|\mathcal{X}_{ru}| \ge 1$. Substituting this result into (78) shows that $|\mathcal{X}_{ru}| = 1$. Therefore,
$X$ is a function of $R$ and $U$, which verifies

$$H(X \mid UR) = 0. \tag{79}$$

Together with (5)-(7), it is easy to verify that $I(R;UX) = H(X) = \log|\mathcal{U}|$. ■

C. The fundamental tradeoff 

An important open problem is to find coding schemes which can achieve points on the tradeoff
curve between Points 1 and 2 in Fig. 8. For a given source distribution $P_U$ and number of
channel uses $H(X) = \log|\mathcal{U}| + \gamma$, with $\gamma \ge 0$, we need to solve the following optimization
problem,

$$f(\gamma) = \inf_{P_{RX|U} \in \mathcal{P}_\gamma} I(R;UX), \tag{80}$$

where

$$\mathcal{P}_\gamma = \{P_{RX|U} : I(R;U) = I(X;U) = H(U \mid X, R) = 0,\ H(X) = \log|\mathcal{U}| + \gamma\} \tag{81}$$

is the set of feasible conditional distributions yielding an EPS system with the specified number
of channel uses.

Solving (80) remains open in general. However, two important structural properties of $f(\gamma)$
are given in the following proposition.

Proposition 15: Let $P_U$ and $\gamma \ge 0$ be given. Then $\mathcal{P}_\gamma$ defined in (81) is non-empty for $\gamma \ge 0$,
and $f(\gamma)$ defined in (80) is non-increasing in $\gamma$.

Proof: A non-vacuous feasible set is demonstrated as follows. Let $(R, U, X)$ be a given
EPS system. Define a second EPS system $(R', U', X')$ as follows. Let $(R', U') = (R, U)$ and
$X' = (X, A)$, where $A$ is a random variable independent of $(R, U, X)$ such that $H(A) = \delta$ for
any given $\delta \ge 0$. In other words, $(R', U', X')$ is constructed by adding some spurious randomness
into the ciphertext of the EPS system $(R, U, X)$. Setting $\delta = \gamma$ and supposing that $(R, U, X)$ is
a cipher system using a one-time pad yields $P_{R'X'|U'} \in \mathcal{P}_\gamma$.

By the same trick, we can show that $f(\gamma)$ is non-increasing. For any $\gamma \ge 0$ and $\epsilon > 0$, let
$(R, U, X)$ be an EPS system such that $P_{RX|U} \in \mathcal{P}_\gamma$ and

$$I(R;UX) \le f(\gamma) + \epsilon. \tag{82}$$

It is easy to check that $P_{R'X'|U'} \in \mathcal{P}_{\gamma+\delta}$ and $H(X \mid UR) = H(X' \mid U'R') - \delta$. Then

\begin{align}
f(\gamma + \delta) &= \inf_{\mathcal{P}_{\gamma+\delta}} I(X;UR) \tag{83} \\
&= \inf_{\mathcal{P}_{\gamma+\delta}} \big(H(X) - H(X \mid UR)\big) \tag{84} \\
&= \log|\mathcal{U}| + \gamma + \delta - \sup_{\mathcal{P}_{\gamma+\delta}} H(X \mid UR) \tag{85} \\
&\le \log|\mathcal{U}| + \gamma + \delta - H(X' \mid U'R') \tag{86} \\
&= H(X) - H(X \mid UR) \tag{87} \\
&\le f(\gamma) + \epsilon, \tag{88}
\end{align}

where (88) follows from (82). Since $\epsilon > 0$ is arbitrary, the second claim of the proposition is
proved. ■
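
The construction used in this proof, appending spurious randomness $A$ to the ciphertext, can be illustrated numerically. The Python sketch below starts from a one-time pad over a three-symbol source, appends $k$ fair bits (so $\delta = k$, an integer, which is a simplifying assumption of the sketch), and compares $H(X)$ and $I(R;UX)$ before and after. The source distribution and all function names are hypothetical.

```python
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, idx):
    """Marginalize a joint distribution keyed by (r, u, x) onto coordinates idx."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out


def otp_joint(p_u):
    """One-time pad: R uniform on {0..n-1}, X = U + R mod n, joint keyed (r, u, x)."""
    n = len(p_u)
    return {(r, u, (u + r) % n): p_u[u] / n for u, r in product(range(n), repeat=2)}


def add_spurious_bits(joint, k):
    """X' = (X, A), where A is k fair bits independent of (R, U, X), so H(A) = k."""
    return {(r, u, (x, a)): p / 2 ** k
            for (r, u, x), p in joint.items() for a in range(2 ** k)}


def key_consumption(joint):
    """I(R;UX) = H(R) + H(U,X) - H(R,U,X)."""
    return entropy(marginal(joint, (0,))) + entropy(marginal(joint, (1, 2))) - entropy(joint)


base = otp_joint([0.5, 0.3, 0.2])
padded = add_spurious_bits(base, k=1)
for name, j in (("original", base), ("padded", padded)):
    h_x = entropy(marginal(j, (2,)))
    print(f"{name}: H(X) = {h_x:.4f}, I(R;UX) = {key_consumption(j):.4f}")
```

The padded system uses one more bit of channel but consumes the same amount of key, which is why a feasible point at $\gamma$ always yields a feasible point at $\gamma + \delta$ with no larger key consumption.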



V. Compression before Encryption 

In Section I, we discussed the standard approach of compression before encryption suggested
by Shannon (see the corresponding figure there). In the following, we will show that this approach
is not necessarily the right way to minimize either $I(R;UX)$ or $H(X)$ in error free perfect secrecy
systems. For simplicity, all units in this section are in bits and logarithms are with base 2.






A central idea in lossless data compression is to encode frequently occurring symbols (or 
strings) using shorter codewords. However, this can cause problems in the context of EPS 
systems. For instance, suppose our cipher consists of a Huffman code followed by a one- 
time pad using a key with the same length as the Huffman codeword. At first glance, this 
approach can reduce both the ciphertext size and the key size to the minimum expected codeword 
length. Unfortunately, this method is not secure because the length of the output discloses some 
information about the message. Consider an extreme case in which the message is generated according
to $P_U(i) = 2^{-i}$ for $1 \le i < \ell$ and $P_U(\ell) = 2^{-(\ell-1)}$. If a binary Huffman code is used, the message
is uniquely identified by the length when $U < \ell - 1$.
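
The leakage in this extreme case can be seen by computing the Huffman codeword lengths explicitly. The Python sketch below is a minimal illustration under stated assumptions: the function name and the choice $\ell = 5$ are hypothetical, and the Huffman construction is a standard heap-based one that tracks codeword lengths only.

```python
import heapq
from collections import defaultdict
from fractions import Fraction
from itertools import count


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    tie = count()  # tie-breaker so that symbol lists are never compared
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:        # every symbol under the merged node gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths


ell = 5
# P_U(i) = 2^-i for 1 <= i < ell and P_U(ell) = 2^-(ell-1)
p_u = [Fraction(1, 2 ** i) for i in range(1, ell)] + [Fraction(1, 2 ** (ell - 1))]
lengths = huffman_lengths(p_u)

by_length = defaultdict(list)
for message, n in enumerate(lengths, start=1):
    by_length[n].append(message)

# Messages whose codeword length is shared with no other message are revealed.
leaked = sorted(m[0] for m in by_length.values() if len(m) == 1)
print("codeword lengths:", lengths)
print("messages revealed by length alone:", leaked)
```

Only the two least likely messages share a codeword length, so an eavesdropper who observes the ciphertext length learns every message with $U < \ell - 1$.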

This problem can be solved by different methods. One solution has been discussed in
Example 2. In this section, we only consider the compress-encrypt-pad scheme of Fig. 9, since
this is sufficient to illustrate the deficiencies of compression before encryption.

In Fig. 9, a prefix code is used to encode the message $U$ and a codeword with length $\sigma(U)$
is obtained. The codeword is further encrypted by a one-time pad using a key with the same
length $\sigma(U)$. After application of the one-time pad, fair bits are appended such that the output
has a constant length $\gamma$ equal to the length of the longest codeword, $\max_{u \in \mathcal{U}} \sigma(u)$. The receiver
decrypts the message by applying the key bit-by-bit to the ciphertext until a codeword in the prefix
code is obtained.



[Figure 9: block diagram. $U$ → Prefix Code → $(c_1, \ldots, c_{\sigma(U)})$ → modulo-two addition with the key
$(k_1, \ldots, k_{\sigma(U)})$ → $(e_1, \ldots, e_{\sigma(U)})$ → Padding → $X$.]

Fig. 9. A compression-encryption-padding scheme.



In this scheme, the ciphertext $X$ has a uniform distribution so that $H(X) = \gamma$. Since $\gamma$ is the
length of the longest codeword and a prefix code is uniquely decodable, $\gamma \ge \log\ell$, where $\ell = |\mathcal{U}|$.
Therefore, $H(X) \ge \log\ell$, in agreement with Theorem 1. This scheme requires an initial key of
length $H(R) \ge \log\ell$ bits, providing a sufficiently long secret key in case the longest codeword
is the one that happens to be generated.

Let us now compare the performance of this scheme with the bounds obtained in Section IV-A,
where the minimal expected key consumption is assumed. Suppose the Shannon code [4] is used
in the scheme described in Fig. 9 to construct an EPS system. The performance is given in the
following theorem.

Theorem 16: If the Shannon code is used in the compress-encrypt-pad scheme described in
Fig. 9 to construct an EPS system, then

$$H(R) = H(X) = \left\lceil \log\frac{1}{\pi} \right\rceil, \tag{89}$$

which exceeds the lower bound in (74) by no more than 1 bit. Furthermore, the expected key
consumption exceeds the lower bound (56) by no more than 1 bit,

$$I(R;UX) < H(U) + 1.$$

Proof: Recall that $\sigma(u)$ is the length of the codeword assigned to $U = u$. Then the longest
codeword has length equal to

$$\left\lceil \log\frac{1}{\pi} \right\rceil, \tag{90}$$

where $\pi = \min_{u \in \mathcal{U}} P_U(u)$. Recall in Fig. 9 that fair bits are appended to each codeword to
construct a constant length ciphertext $X$. Therefore, $H(R) = H(X) = \lceil \log\frac{1}{\pi} \rceil$, which is within
one bit of the lower bound in (74). Furthermore, the expected key consumption is

\begin{align}
I(R;UX) &= H(R) - H(R \mid UX) \tag{91} \\
&= \left\lceil \log\frac{1}{\pi} \right\rceil - \left( \left\lceil \log\frac{1}{\pi} \right\rceil - \sum_{u \in \mathcal{U}} P_U(u) \sigma(u) \right) \tag{92} \\
&= \sum_{u \in \mathcal{U}} P_U(u) \sigma(u) \tag{93} \\
&< H(U) + 1, \tag{94}
\end{align}

where (92) follows from the fact that $H(R \mid UX)$ is equal to the expected number of appended fair bits,
and (94) follows from [4, (5.29)-(5.32)]. Therefore, $I(R;UX)$ is also within a bit of the lower
bound in (56). ■

Therefore, we conclude that if the Shannon code is used for compression in Fig. 9, then the
performance is close to the optimal code in the minimal key consumption regime when both
$H(U) > 1$ and $\log\ell > 1$.
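
The quantities in Theorem 16 are simple to compute for a concrete source. The Python sketch below evaluates the Shannon codeword lengths $\lceil \log(1/P_U(u)) \rceil$ with exact rational arithmetic, then reports $H(X)$ from (89) and the expected key consumption from (93). The function names are illustrative assumptions; the test source is the distribution $P_U(i) = \Psi(i)/28$ with $\Psi = (1, 1, 1, 3, 4, 7, 11)$ used later in Table I.

```python
from fractions import Fraction
from math import log2


def shannon_lengths(p_u):
    """Shannon code lengths: the smallest n with 2^n >= 1/p, i.e. ceil(log2(1/p))."""
    lengths = []
    for p in p_u:
        n = 0
        while p * 2 ** n < 1:
            n += 1
        lengths.append(n)
    return lengths


def compress_encrypt_pad(p_u, lengths):
    """Return (H(X), I(R;UX)) in bits for the compress-encrypt-pad scheme.

    H(X) is the longest codeword length (89); I(R;UX) is the expected length (93).
    """
    h_x = max(lengths)
    key_consumption = sum(p * n for p, n in zip(p_u, lengths))
    return h_x, key_consumption


p_u = [Fraction(k, 28) for k in (1, 1, 1, 3, 4, 7, 11)]
h_u = -sum(float(p) * log2(float(p)) for p in p_u)
h_x, key = compress_encrypt_pad(p_u, shannon_lengths(p_u))

print("Shannon lengths:", shannon_lengths(p_u))
print(f"H(X) = {h_x} bits, I(R;UX) = {float(key):.3f} bits, H(U) = {h_u:.3f} bits")
```

The printed key consumption should lie strictly between $H(U)$ and $H(U) + 1$, consistent with (94).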

Now, we compare the performance obtained when the Huffman code is used in place of the
Shannon code. In this case, the expected key consumption $I(R;UX)$ can again be analyzed
similarly to (91)-(94). Since the expected codeword length in (93) is shorter for the Huffman
code, a smaller $I(R;UX)$ can be obtained. However, the longest codeword in the Huffman code
can be longer than the longest codeword in the Shannon code. As a consequence, larger $H(X)$
and $H(R)$ are required for certain $P_U$. This can be seen in the example in Table I. In the worst
case, the longest codeword in the Huffman code can be as much as 44% longer than the longest
codeword in the Shannon code [10]. Furthermore, the partition code $C(\Psi)$ in Table I outperforms
the compression before encryption schemes based on either the Huffman code or the Shannon
code because $C(\Psi)$ is optimal according to Theorem 13. On the other hand, the Shannon code
uses unnecessarily long codewords for certain source distributions, e.g., $P_U = (0.9, 0.1)$. As a
consequence, a larger $H(X)$ is needed, as shown in Table II. However, the minimal $I(R;UX)$ or
the minimal $H(X)$ can be obtained using different partition codes. We conclude that compression
before encryption is a suboptimal strategy to minimize key consumption or the number of channel
uses in EPS systems.

TABLE I

Comparing different schemes with $\Psi = (1, 1, 1, 3, 4, 7, 11)$ and $P_U(i) = \Psi(i)/28$ for $1 \le i \le 7$

              Huffman    Shannon    Partition C(Ψ)
  I(R; UX)    2.357      2.679      2.291 = H(U)
  H(X)        6          5          5

TABLE II

Comparing different schemes with $\Psi = (9, 1)$, $\Psi' = (1, 1)$ and $P_U = (0.9, 0.1)$

              Huffman    Shannon    Partition C(Ψ)    Partition C(Ψ')
  I(R; UX)    1          1.3        0.469 = H(U)      1
  H(X)        1          4          4                 1
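
The Huffman columns of the two tables can be reproduced with a few lines of code. The Python sketch below computes binary Huffman codeword lengths and then applies the compress-encrypt-pad accounting, in which $H(X)$ is the longest codeword length and $I(R;UX)$ is the expected codeword length. The function names are illustrative assumptions, and the Huffman routine is a standard heap-based construction that tracks lengths only.

```python
import heapq
from fractions import Fraction
from itertools import count


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    tie = count()  # tie-breaker so that symbol lists are never compared
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:        # symbols under the merged node move one level down
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths


def huffman_scheme(p_u):
    """Return (H(X), I(R;UX)) when a Huffman code is used before encryption."""
    lengths = huffman_lengths(p_u)
    return max(lengths), sum(p * n for p, n in zip(p_u, lengths))


table_1 = [Fraction(k, 28) for k in (1, 1, 1, 3, 4, 7, 11)]
table_2 = [Fraction(9, 10), Fraction(1, 10)]

for name, source in (("Table I", table_1), ("Table II", table_2)):
    h_x, key = huffman_scheme(source)
    print(f"{name}: H(X) = {h_x} bits, I(R;UX) = {float(key):.3f} bits")
```

Comparing the output with the Shannon and partition-code columns shows the pattern discussed in the text: the Huffman code lowers the expected key consumption relative to the Shannon code for the first source, but at the price of a longer ciphertext.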



Suppose now that the source distribution is $d$-adic and the smallest probability mass in $P_U$ is
equal to $d^{-k}$ for certain integers $d$ and $k$. What were binary digits in the scheme described above
in Fig. 9 now become $d$-ary symbols. It can be verified that the longest codeword has length equal
to $k$. Therefore, both $d$-ary Shannon codes and $d$-ary Huffman codes can achieve the minimal
$H(X)$ and $H(R)$ in (74). Furthermore, the expected codeword length is equal to $H(U)$. By (93),
$I(R;UX)$ is equal to the expected codeword length, which is equal to $H(U)$. Therefore, the
minimal $I(R;UX)$ is achieved. However, a prefix code cannot achieve the expected codeword
length $H(U)$ when $P_U$ is not $d$-adic [5, Theorem 4.6]. Again consider the example in Table II,
where $P_U = (0.9, 0.1)$. Only the partition code, and neither the Shannon nor the Huffman code,
can be used to achieve $I(R;UX) = H(U)$. Indeed, the $d$-adic distribution is just a special case of the
condition used in Theorem 13. Therefore, the partition code can achieve the minimal $I(R;UX)$
for a wider range of $P_U$.

VI. Proof of Theorem 9

Suppose there exists $u \in \mathcal{U}$ such that $P_U(u)$ is irrational. Define a new random variable $U^*$
such that

$$U^* = \begin{cases} 0 & \text{if } U = u, \\ 1 & \text{otherwise.} \end{cases}$$

Then $P_{U^*}(0)$ and $P_{U^*}(1)$ are irrational. As $U^*$ is a function of $U$, by (5)-(7) and (63),

$$I(U^*;R) = I(U^*;X) = I(X;R) = H(U^* \mid XR) = 0. \tag{95}$$

Therefore, it suffices to consider binary $U$.

Let $\mathcal{X}$ and $\mathcal{R}$ be the respective supports of $X$ and $R$. Suppose to the contrary first that $|\mathcal{X}|$
and $|\mathcal{R}|$ are both finite. We can assume without loss of generality that

\begin{align}
\mathcal{X} &= \{1, \ldots, n\} \tag{96} \\
\mathcal{R} &= \{1, \ldots, m\}. \tag{97}
\end{align}

Let

\begin{align}
x_i &= P_X(i), \quad i = 1, \ldots, n \tag{98} \\
r_j &= P_R(j), \quad j = 1, \ldots, m, \tag{99}
\end{align}

and let $x$ be the $n$-row vector with entries $x_i$. Similarly, define the column vector $r$.

As $X$ and $R$ are independent and $H(U \mid XR) = 0$, there exists a function $g$ such that
$U = g(X, R)$. Hence, from $X$ and $R$ we induce an $n \times m$ decoding matrix $G$ with entries

$$G_{i,j} = g(i, j), \quad i = 1, \ldots, n, \ j = 1, \ldots, m.$$

Then

\begin{align}
\sum_{j=1}^{m} G_{i,j} r_j &= P_U(1), \quad i = 1, \ldots, n \tag{100} \\
\sum_{i=1}^{n} x_i = \sum_{j=1}^{m} r_j &= 1 \tag{101} \\
x_i > 0, \ r_j &> 0, \quad i = 1, \ldots, n, \ j = 1, \ldots, m \tag{102} \\
\sum_{i=1}^{n} x_i G_{i,j} &= P_U(1), \quad j = 1, \ldots, m. \tag{103}
\end{align}

Here, (100) is due to the fact that $I(U;X) = 0$, (101) and (102) are required since $P_X$ and $P_R$
are probability distributions, and (103) follows from $I(U;R) = 0$.

In fact, for any $x$, $r$ and binary matrix $G$ satisfying the above four conditions, one can construct
random variables $(U, R, X)$ such that

$$I(U;R) = I(U;X) = I(X;R) = H(U \mid XR) = 0, \tag{104}$$

where $U = g(X, R)$ and the probability distributions of $X$ and $R$ are specified by the vectors $x$
and $r$ respectively.

In the following, we will prove that if the rows of $G$ are not independent, then we can construct
another random variable $X^*$ with support $\mathcal{X}^*$, $|\mathcal{X}^*| < |\mathcal{X}|$, such that

$$I(U;R) = I(U;X^*) = I(X^*;R) = H(U \mid X^* R) = 0. \tag{105}$$

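To prove this claim, suppose that there exist disjoint subsets $A$ and $B$ of $\{1, \ldots, n\}$ and
positive numbers $\alpha_i$, $i \in A \cup B$, such that

$$\sum_{i \in A} \alpha_i G_i = \sum_{k \in B} \alpha_k G_k, \tag{106}$$

where $G_i$ is row $i$ of $G$. Then we claim that

$$\sum_{i \in A} \alpha_i = \sum_{k \in B} \alpha_k.$$

Multiplying both sides of (106) by $r$ and using (100),

\begin{align}
\sum_{i \in A} \alpha_i G_i r &= \sum_{k \in B} \alpha_k G_k r \tag{107} \\
\sum_{i \in A} \alpha_i P_U(1) &= \sum_{k \in B} \alpha_k P_U(1) \tag{108} \\
\sum_{i \in A} \alpha_i &= \sum_{k \in B} \alpha_k. \tag{109}
\end{align}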

Let $\epsilon = \min_{i \in A \cup B} x_i/\alpha_i$. Assume without loss of generality that $n \in A$ and that $\epsilon = x_n/\alpha_n$.
Define

$$x_i^* = \begin{cases} x_i - \epsilon \alpha_i & \text{if } i \in A, \\ x_i + \epsilon \alpha_i & \text{if } i \in B, \\ x_i & \text{otherwise.} \end{cases}$$

Note that $x_n^* = 0$. Suppose that the probability distribution of $X$ is changed such that $P_X(i) = x_i^*$.
Then it can be checked easily, using (106) and (109), that $U$, $X$, $R$ still satisfy (5)-(7) and (63).
Furthermore, the size of the support of $P_X$ is now at most $n - 1$.

Repeating this procedure, we can prove that for any random variable $U$, if there exist auxiliary
random variables $X$, $R$ satisfying (5)-(7) and (63), then there exist auxiliary random variables
$X^*$, $R^*$ such that (95) is satisfied and the rows and columns of the decoding matrix induced by
$X^*$ and $R^*$ are all linearly independent. Hence, the decoding matrix $G$ induced by $X^*$ and $R^*$
must be square (and thus $m = n$). Consequently,

$$\sum_{i=1}^{n} x_i G_{i,j} = P_U(1), \quad j = 1, \ldots, n. \tag{110}$$

There exists a unique solution $(z_1, \ldots, z_n)$ such that

$$\sum_{i=1}^{n} z_i G_{i,j} = 1, \quad j = 1, \ldots, n. \tag{111}$$

Clearly, $z_i = x_i/P_U(1)$. As all the entries in $G$ are either 0 or 1, all the $z_i$ are rational numbers.
Therefore,

$$1 = \sum_{i=1}^{n} x_i = P_U(1) \sum_{i=1}^{n} z_i. \tag{112}$$

Hence, $P_U(1)$ must be rational and a contradiction occurs. We have proved that $\mathcal{X}$ and $\mathcal{R}$ cannot
both be finite. The case when only $\mathcal{X}$ or $\mathcal{R}$ is finite can be similarly proved.



VII. Conclusion 

This paper studied perfect secrecy systems with zero decoding error at the receiver, with the
additional assumption that the message $U$ and the secret key $R$ are independent, $I(U;R) = 0$.
Under this setup, we found a new bound $\log|\mathcal{U}| \le H(R)$ on the key requirement, improving on
Shannon's fundamental bound $H(U) \le H(R)$ for perfect secrecy.

To transmit the ciphertext $X$, the lower bound on the minimum number of channel uses has
been shown to be $\log|\mathcal{U}| \le H(X)$. If the source distribution is defined on a countably infinite
support or a support with unbounded size, no security system can simultaneously achieve perfect
secrecy and zero decoding error.

We also defined and justified three new concepts: residual secret randomness, expected key
consumption, and excess key consumption. We have demonstrated the feasibility of extracting
residual secret randomness in multi-round secure communications which use a sequence of
error free perfect secrecy systems. We quantified the residual secret randomness as $H(R \mid UX)$.
We further distinguished between the size $H(R)$ of the secret key required prior to the
commencement of transmission, and the expected key consumption $I(R;UX)$ in a multi-round
setting. In contrast to $H(R) \ge \log|\mathcal{U}|$, we showed that $I(R;UX)$ is lower bounded by $H(U)$,
giving a more precise understanding of the role of source entropy in error free perfect secrecy
systems. The excess key consumption is quantified as $I(R;X)$, and is equal to 0 if and only if
the minimal expected key consumption is achieved.

One of the main objectives of this paper was to reveal the fundamental tradeoff between
expected key consumption and the number of channel uses. For the regime where the minimal
$I(R;UX)$ is assumed, $H(X)$ and $H(R)$ are inevitably larger, and corresponding lower bounds
for $H(X)$ and $H(R)$ have been obtained. If the source distribution $P_U$ has irrational probability
masses, the additional requirements on the alphabet sizes of $X$ and $R$ to achieve minimal $I(R;UX)$
have been shown. We have proposed a new code, the partition code, which generalizes the one-time
pad and can achieve minimal $I(R;UX)$ when all the probability masses in $P_U$ are rational. In
some cases, the partition code can simultaneously attain the minimal $H(X)$ and $H(R)$ in this
regime.

At the other extreme, the regime where the minimal number of channel uses is assumed, the
one-time pad has been shown to be optimal. For the intermediate regime, we have formulated
an optimization problem for the fundamental tradeoff between $I(R;UX)$ and $H(X)$. We also
demonstrated that compression before encryption cannot minimize either $H(R)$, $H(X)$ or
$I(R;UX)$.

This paper has highlighted a few open problems. First, the complete characterization of the
tradeoff between $I(R;UX)$ and $H(X)$ remains open. Second, the partition code is only one
class of codes designed to minimize expected key consumption. Codes achieving other points
on the tradeoff curve are yet to be discovered. In particular, a code achieving minimal $H(X)$
and $H(R)$ in the regime of minimal expected key consumption is important for the design of
efficient and secure systems.